PTMs and half-lives
It is assumed that each protein carries only one modification. Proteins without modifications are not identified.
Protein half-lives for short-lived proteins can be found here: Proteome-wide mapping of short-lived proteins in human cells (ScienceDirect)
Protein half-lives for long-lived proteins can be found here: Systematic analysis of protein turnover in primary cells (Nature Communications)
Check against the literature that phosphorylation is the most abundant PTM.
Proteins with a short half-life
Proteins can have varying half-lives.
Below is a comparison of the distribution of half-lives reported in the literature and the distribution of the subset of those half-lives that belongs to proteins found in the dataset (no outliers have been removed yet).
Outliers
I want to remove the proteins with very high values of log10(counts_norm_abund_len).
Detecting the outliers:
[1] "Dimensions BEFORE removing the outliers"
[1] 181235 18
[1] "Dimensions AFTER removing the outliers"
[1] 180165 18
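The text does not state which rule was used to flag outliers. A minimal sketch, assuming an upper-tail 1.5 × IQR rule applied to the log10 values, could look like this. The toy data frame and the helper `remove_high_outliers` are hypothetical; only the column name `counts_norm_abund_len` comes from the analysis.

```r
# Toy stand-in for the real table; only the column name is taken from the text.
toy <- data.frame(
  Uniprot_entry_name = paste0("P", 1:8, "_HUMAN"),
  counts_norm_abund_len = c(0.010, 0.012, 0.011, 0.013, 0.009, 0.014, 0.010, 5.000)
)

# Assumption: a row is an outlier when its log10 value lies above
# Q3 + k * IQR. Only the upper tail is trimmed.
remove_high_outliers <- function(df, column, k = 1.5) {
  x <- log10(df[[column]])
  upper <- quantile(x, 0.75, na.rm = TRUE) + k * IQR(x, na.rm = TRUE)
  df[!is.na(x) & x <= upper, , drop = FALSE]
}

print("Dimensions BEFORE removing the outliers")
print(dim(toy))
toy_clean <- remove_high_outliers(toy, "counts_norm_abund_len")
print("Dimensions AFTER removing the outliers")
print(dim(toy_clean))
```

Because the helper takes the column name as an argument, the same sketch could be pointed at a half-life column when only very large half-lives should be dropped.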
The outliers have been removed now. What is the resulting distribution?
Looking in more detail
What are the most modified proteins?
mod <- human_ptms_hl_short %>%
  group_by(Uniprot_entry_name) %>%
  summarise(sum = sum(counts_norm_abund_len)) %>%
  arrange(desc(sum))
mod[1:5, ]
# A tibble: 5 × 2
  Uniprot_entry_name   sum
  <chr>              <dbl>
1 EAPP_HUMAN         0.190
2 RASF3_HUMAN        0.178
3 MTBP_HUMAN         0.177
4 CH088_HUMAN        0.171
5 NUFP1_HUMAN        0.170
Looking at a particular half-life range:
# write.table(df$Uniprot_entry_name, file = '/Users/anastasialinchik/Downloads/proteins.tsv', row.names = FALSE, sep = "\t", quote = FALSE)
PTMs
PTMs of interest:
PTMs that control autophagy
phosphorylation
ubiquitination -> need to use the new dataset
acetylation
oxPTMs
- you have a list of these
Methylation, e.g. of histones
Lysine (K) acylation -> need to get this from this paper
AGEs as markers of ageing
Phosphorylation
This is already without outliers
- Only the modification [21]Phospho is present here.
The dataset is split into a group of phosphorylated proteins and a second group containing all remaining proteins.
It is not necessary to include another density line with all of the proteins. You can just compare the two distributions.
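The split into two groups can be sketched as follows. This is an illustration under assumptions: the toy data and the column name `modification` are hypothetical, while `Uniprot_entry_name`, `mean_hl_hours`, `mod_group` and the label `[21]Phospho` are taken from the analysis. A protein counts as phosphorylated if any of its rows carries that label.

```r
library(dplyr)

# Toy long-format table: one row per protein/modification pair (hypothetical values).
toy <- data.frame(
  Uniprot_entry_name = c("A_HUMAN", "A_HUMAN", "B_HUMAN", "C_HUMAN", "C_HUMAN"),
  modification = c("[21]Phospho", "[1]Acetyl", "[34]Methyl", "[21]Phospho", "[21]Phospho"),
  mean_hl_hours = c(4.2, 4.2, 11.0, 18.5, 18.5)
)

# Collapse to one row per protein and flag it when any row is [21]Phospho.
phospho_groups <- toy %>%
  group_by(Uniprot_entry_name, mean_hl_hours) %>%
  summarise(
    mod_group = if_else(any(modification == "[21]Phospho"), "Phosphorylated", "-"),
    .groups = "drop"
  )

print(table(phospho_groups$mod_group))
```

Collapsing to one row per protein before testing keeps proteins with many modification rows from being counted more than once.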
Comparison
[1] "Non-modified"
[1] 252 4
[1] "Modified"
[1] 914 4
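Testing whether the half-lives differ significantly between the two groups with a Wilcoxon rank-sum test (note that the sample sizes are uneven). The output labels the test as a design-based Kruskal-Wallis test; with two groups this corresponds to the Wilcoxon rank-sum test. The p-value was adjusted using the formula p*sqrt(N/100), where N = n1 + n2.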
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 2.4341, df = 1164, p-value = 0.01508
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.09386352
[1] "adjusted p-value (Good's Bayes adjustment)"
gPhosphorylated
0.05149134
[1] 1166 4
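The adjustment p*sqrt(N/100) can be checked with a few lines of base R. The helper names `adjust_p` and `adjust_p_capped` are hypothetical, and the capped variant is an assumption rather than part of the analysis: it only reflects that a p-value cannot exceed 1. The base `wilcox.test` call on simulated data is a stand-in for the design-based test shown in the output, not the same procedure.

```r
# Adjustment stated in the text: p * sqrt(N / 100), with N = n1 + n2.
adjust_p <- function(p, n1, n2) {
  p * sqrt((n1 + n2) / 100)
}

# Phosphorylation, short-lived proteins: 252 non-modified and 914 modified.
adjust_p(0.01508, 252, 914)  # approximately 0.0515

# Assumption: cap the adjusted value at 1 so it stays a valid probability.
adjust_p_capped <- function(p, n1, n2) {
  pmin(1, adjust_p(p, n1, n2))
}
adjust_p_capped(0.3678, 391, 775)

# Stand-in test on simulated half-lives (hypothetical data).
set.seed(1)
toy <- data.frame(
  mean_hl_hours = c(rexp(30, rate = 1 / 10), rexp(60, rate = 1 / 12)),
  mod_group = rep(c("-", "Phosphorylated"), times = c(30, 60))
)
raw_p <- wilcox.test(mean_hl_hours ~ mod_group, data = toy)$p.value
n <- table(toy$mod_group)
adjusted_p <- adjust_p(raw_p, n[["-"]], n[["Phosphorylated"]])
print(c(raw = raw_p, adjusted = adjusted_p))
```

The multiplier grows with the total sample size, so the same raw p-value is penalised more heavily in the larger long-lived dataset than in the short-lived one.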
Acetylation
- Filtered by the [1]Acetyl modification.
[1] "Non-modified"
[1] 616 4
[1] "Modified"
[1] 550 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 3.5568, df = 1164, p-value = 0.0003905
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1638661
[1] "Adjusted p-value"
gAcetylated
0.001333409
[1] 1166 4
Ubiquitination
Ubiquitination has the classification ‘Other’ and is taken as one group. The second group is drawn from all of the PTMs: 890 proteins overlap, which leaves 289 proteins that are not ubiquitinated, carry PTMs and have known half-lives. These make up the second group.
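The overlap logic can be sketched with base R set operations. The two identifier vectors are hypothetical placeholders for the ubiquitination dataset and for the proteins that have PTMs and a known half-life.

```r
# Hypothetical identifiers; the real vectors would come from the two datasets.
ubiquitinated <- c("A_HUMAN", "B_HUMAN", "C_HUMAN", "X_HUMAN")
with_ptm_and_hl <- c("A_HUMAN", "B_HUMAN", "C_HUMAN", "D_HUMAN", "E_HUMAN")

# Group 1: proteins with a known half-life that are also ubiquitinated.
modified <- intersect(with_ptm_and_hl, ubiquitinated)

# Group 2: proteins with PTMs and a known half-life that are not ubiquitinated.
non_modified <- setdiff(with_ptm_and_hl, ubiquitinated)

print(length(modified))
print(length(non_modified))
```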
[1] "Non-modified"
[1] 278 4
[1] "Modified"
[1] 898 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -2.3096, df = 1174, p-value = 0.02108
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.09965387
[1] "Adjusted p-value"
gUbiquitinated
0.07230265
[1] 1176 4
Methylation
- Filtered by the [34]Methyl modification
Violin plots
[1] "Non-modified"
[1] 391 4
[1] "Modified"
[1] 775 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 0.90091, df = 1164, p-value = 0.3678
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.04207529
[1] "Adjusted p-value"
gMethylated
1.256003
[1] 1166 4
oxPTMs
This is only for proteins that are related to ageing.
All PTMs related to oxidative damage in general, not only oxidation.
[1] "Non-modified"
[1] 19 4
[1] "Modified"
[1] 1171 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 1.5091, df = 1188, p-value = 0.1315
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1974355
[1] "Adjusted p-value"
goxPTMs
0.4537599
[1] 1190 4
Lysine acylations
Violin plot
[1] "Non-modified"
[1] 707 4
[1] "Modified"
[1] 465 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 0.49822, df = 1170, p-value = 0.6184
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.03815802
[1] "Adjusted p-value"
gK acylation
2.117143
[1] 1172 4
AGEs
Violin plots
[1] "Non-modified"
[1] 902 4
[1] "Modified"
[1] 269 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -1.0866, df = 1169, p-value = 0.2774
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.07799565
[1] "Adjusted p-value"
gAGEs
0.9494111
[1] 1171 4
Binning
Hypothesis: The higher the half-life, the greater the number of PTMs.
Phosphorylation
oxPTMs
methylation
Ubiquitination, acetylation, lysine acylation, AGEs
General:
Broken down by the modifications
Check the number of proteins in each bin.
# A tibble: 5 × 2
  hl_group protein_count
  <chr>            <int>
1 0-5                234
2 10-15              256
3 15-20              182
4 20+                286
5 5-10               226
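The table above sorts the bins alphabetically because `hl_group` is a character column, which puts 5-10 last. A minimal sketch of the binning that keeps the bins in numeric order uses `cut()` with explicit labels. The toy half-lives are hypothetical, and treating each bin as closed on the left (for example, 5 falls into 5-10) is an assumption, since the text does not say how boundary values were assigned.

```r
library(dplyr)

# Hypothetical half-lives in hours.
toy <- data.frame(
  Uniprot_entry_name = paste0("P", 1:6, "_HUMAN"),
  mean_hl_hours = c(2.5, 7.0, 12.3, 17.8, 26.0, 4.9)
)

# cut() returns a factor whose levels follow the order of `labels`.
binned <- toy %>%
  mutate(hl_group = cut(
    mean_hl_hours,
    breaks = c(0, 5, 10, 15, 20, Inf),
    labels = c("0-5", "5-10", "10-15", "15-20", "20+"),
    right = FALSE
  ))

# Number of proteins per bin, listed in bin order.
bin_counts <- binned %>% count(hl_group, name = "protein_count")
print(bin_counts)
```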
oxPTMs + phospho
`summarise()` has grouped output by 'hl_group'. You can override using the
`.groups` argument.
# A tibble: 15 × 3
# Groups: hl_group [5]
   hl_group mod_group      protein_count
   <chr>    <chr>                  <int>
 1 0-5      -                        233
 2 0-5      Phosphorylated           165
 3 0-5      oxPTMs                   227
 4 10-15    -                        253
 5 10-15    Phosphorylated           207
 6 10-15    oxPTMs                   254
 7 15-20    -                        181
 8 15-20    Phosphorylated           146
 9 15-20    oxPTMs                   184
10 20+      -                        285
11 20+      Phosphorylated           235
12 20+      oxPTMs                   285
13 5-10     -                        220
14 5-10     Phosphorylated           161
15 5-10     oxPTMs                   221
Proteins with a long half-life
Long-lived proteins can be used as estimators of chronological age. Long-lived proteins can be defined in different ways, for example based on the half-life of the protein when compared to the average half-life of proteins in the organism. In this case, long-lived proteins were obtained from the following study: paper. Proteins were classified as long-lived based on their degree of degradation during the experiment and therefore it was possible to discover new long-lived proteins (no a priori assumptions were made).
The study identified a list of long-lived proteins in rats, so the human orthologs of these proteins were looked up.
Plot the data distributions
Outliers
[1] "Dimensions BEFORE removing the outliers"
[1] 2228928 18
[1] "Dimensions AFTER removing the outliers"
[1] 2131902 18
All of the outliers have been removed.
Check the distribution of the half-lives:
The proteins with very large half-lives also need to be removed (proteins with short half-lives were not removed, even if they were identified as outliers):
[1] "Dimensions BEFORE removing the outliers"
[1] 2131902 18
[1] "Dimensions AFTER removing the outliers"
[1] 2063092 18
Now the exact same thing but for `human_complete_hl_long`
(Note that the dataset with the original data still has the proteins with very large half-lives so the scale needs to be based on the subset, not the original set)
Looking in more detail
What are the most modified proteins?
mod <- human_ptms_hl_short %>%
  group_by(Uniprot_entry_name) %>%
  summarise(sum = sum(counts_norm_abund_len)) %>%
  arrange(desc(sum))
mod[1:5, ]
# A tibble: 5 × 2
  Uniprot_entry_name   sum
  <chr>              <dbl>
1 EAPP_HUMAN         0.190
2 RASF3_HUMAN        0.178
3 MTBP_HUMAN         0.177
4 CH088_HUMAN        0.171
5 NUFP1_HUMAN        0.170
Looking at a particular half-life range:
# write.table(df$Uniprot_entry_name, file = '/Users/anastasialinchik/Downloads/proteins.csv', row.names = FALSE, quote = FALSE)
PTMs
Phosphorylation
Violin plot:
[1] "Non-modified"
[1] 157 4
[1] "Modified"
[1] 2336 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = -0.90735, df = 2491, p-value = 0.3643
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
-0.03385422
[1] "Adjusted p-value"
gPhosphorylated
1.818986
[1] 2493 4
Acetylation
Violin plot:
[1] "Non-modified"
[1] 263 4
[1] "Modified"
[1] 2230 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 5.7916, df = 2491, p-value = 7.849e-09
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.2002254
[1] "Adjusted p-value"
gAcetylated
3.918759e-08
[1] 2493 4
Ubiquitination
Violin plot
[1] "Non-modified"
[1] 143 4
[1] "Modified"
[1] 2412 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 0.36469, df = 2553, p-value = 0.7154
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.01750828
[1] "Adjusted p-value"
gUbiquitinated
3.615996
[1] 2555 4
Methylation
[1] "Non-modified"
[1] 77 4
[1] "Modified"
[1] 2416 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 3.0915, df = 2491, p-value = 0.002014
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1590465
[1] "Adjusted p-value"
gMethylated
0.0100538
[1] 2493 4
oxPTMs
Violin plot:
[1] "Non-modified"
[1] 2493 4
[1] "Modified"
[1] 2644 4
Lysine acylations
[1] "Non-modified"
[1] 257 4
[1] "Modified"
[1] 2369 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 4.7064, df = 2624, p-value = 2.652e-06
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1983194
[1] "Adjusted p-value"
gK acylation
1.358869e-05
[1] 2626 4
AGEs
Violin plots:
[1] "Non-modified"
[1] 599 4
[1] "Modified"
[1] 2006 4
Design-based KruskalWallis test
data: mean_hl_hours ~ mod_group
t = 4.4658, df = 2603, p-value = 8.315e-06
alternative hypothesis: true difference in mean rank score is not equal to 0
sample estimates:
difference in mean rank score
0.1279503
[1] "Adjusted p-value"
gAGEs
4.244034e-05
[1] 2605 4
Binning
oxPTMs: